add checkm2 #6542
Excellent timing: One of my users just asked for the tool :)
Could contribute a data manager.
remove dbkey column rename tables
and add working output assertions as comment
Good from my side.
#The <version> column indicates the checkm2 version that generated the database
#
#diamond_db_1.0.2 Diamond database 1.0.2 /mnt/galaxyIndices/Checkm2_database/uniref100.KO.1.dmnd
Is this really a Diamond DB?
If so, this is interesting ... should we have a general Diamond location file and DM, with some tag for different tools?
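To make the tag idea concrete, here is a minimal sketch of how a tool could pick its own entries out of a shared Diamond table. Everything in it is an assumption for illustration: the rows, the paths, and the `tool` column name are hypothetical and do not come from an existing location file.

```python
# Hypothetical rows of a shared Diamond data table. The values, paths and the
# `tool` column are illustrative only, not taken from a real .loc file.
ROWS = [
    {"value": "db_a", "path": "/data/db_a.dmnd", "tool": "checkm2"},
    {"value": "db_b", "path": "/data/db_b.dmnd", "tool": "other_tool"},
]


def entries_for_tool(rows, tool):
    """Return only the rows whose `tool` tag matches the requesting tool."""
    return [row for row in rows if row["tool"] == tool]


checkm2_rows = entries_for_tool(ROWS, "checkm2")
```

Strict matching is used here on purpose: an untagged or differently tagged database is never offered to a tool that did not ask for it.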
I think so. And I agree that it would be interesting.
But it would be good to know and store the Diamond version that was used to generate it, right? That seems difficult to find out from the sources. The tool just downloads the latest version from Zenodo (and I could not even find the link). Let me check if diamond dbinfo could help.
Nice:
> diamond dbinfo -d uniref100.KO.1.dmnd
diamond v2.0.4.142 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Database format version = 3
Diamond build = 142
Sequences = 6518230
Letters = 2584051404
Should we do this? Add columns tool, db_format_version, diamond_build?
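A data manager could fill such columns by parsing the dbinfo text. The following is only a sketch under assumptions: it takes the output as a string (the sample is copied from the output quoted above), and the column names `db_format_version` and `diamond_build` are the ones proposed in this thread, not an existing schema.

```python
import re

# Sample text copied from the `diamond dbinfo` output quoted in this thread.
SAMPLE_DBINFO = """\
diamond v2.0.4.142 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Database format version = 3
Diamond build = 142
Sequences = 6518230
Letters = 2584051404
"""


def parse_dbinfo(text):
    """Collect 'key = value' lines into a dict, converting integer values."""
    info = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s*=\s*(\S+)\s*$", line)
        if match:  # banner lines without '=' are skipped
            key, value = match.groups()
            info[key] = int(value) if value.isdigit() else value
    return info


def dbinfo_columns(text):
    """Map parsed fields onto the column names proposed in the thread."""
    info = parse_dbinfo(text)
    return {
        "db_format_version": info.get("Database format version"),
        "diamond_build": info.get("Diamond build"),
    }


columns = dbinfo_columns(SAMPLE_DBINFO)
```

In a real data manager the text would come from running the command on the downloaded file; that step is left out here so the sketch stays independent of a Diamond installation.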
Do we need diamond_build? But yes, we should do that :)
Thanks!
@astrovsky01 do you think you can work on this?
After some digging, I found that CheckM2 doesn't actually work with all Diamond databases. It has an internal checksum to make sure the database is the specific one fetched by its database download command.
As such, I think that while it would be good to have the general Diamond DB data manager, having a specific one for checkm2 is also a good idea.
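For readers unfamiliar with this kind of check, here is a generic sketch of verifying a database file against an expected checksum. It is not CheckM2's code: the thread does not say which hash algorithm or reference value the tool uses, so SHA-256 and the function names below are assumptions.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large databases need not fit in memory.

    The default algorithm is an assumption; the thread does not state
    which one CheckM2 uses.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_database(path, expected_hex, algorithm="sha256"):
    """Return True if the file's checksum equals the expected hex digest."""
    return file_checksum(path, algorithm) == expected_hex.lower()
```

A data manager dedicated to checkm2 could run a check like this after the download and refuse to register a file that does not match.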
Yes, let's go ahead, I would say.
My feeling is that a general data manager would be too complex, and multiple data managers writing to the same data table also seems confusing. Maybe it's better to have tools load multiple data tables?
I was playing around with writing basically extra labels, but that requires someone on the other end to parse the table. Also, depending on the tool, you start to get additional bloat in the requirements. At the very least, checkm2's tool would require its conda package, and that's just an extra dependency for the other tools, even when they don't need it. I think it's a good idea conceptually, but maybe not for this specific case.
Can you elaborate? I do not understand what you want to say.
"writing basically extra labels"
What labels?
"Also, depending on the tool, you start to get additional bloat in the requirements."
How? We would just load another data table; we need no requirements for this.
Oh, I just mean the case where we start to have tools that share the table but can't use all of its entries. I was just referring to the tool label you mentioned at the top of the thread.
And I'd meant requirements for the data manager itself. In this case, you'd need the checkm2 conda package on top of the diamond package, as opposed to just the diamond package in the diamond_build_db data manager that already exists. If other tools operate similarly to checkm2 in the future, they'd need their conda packages added to the data manager's XML.